Loghub: A Large Collection of System Log Datasets towards Automated Log Analytics
Logs have been widely adopted in software system development and maintenance
because of the rich system runtime information they contain. In recent years,
the increasing size and complexity of software have led to rapid growth in log
volume. To handle these large volumes of logs efficiently and
effectively, a line of research focuses on intelligent log analytics powered by
AI (artificial intelligence) techniques. However, only a small fraction of
these techniques has reached successful deployment in industry, owing to the
lack of public log datasets and benchmarks built upon them. To fill this
significant gap between academia and industry and also facilitate more research
on AI-powered log analytics, we have collected and organized loghub, a large
collection of log datasets. In particular, loghub provides 17 real-world log
datasets collected from a wide range of systems, including distributed systems,
supercomputers, operating systems, mobile systems, server applications, and
standalone software. In this paper, we summarize the statistics of these
datasets, introduce some practical log usage scenarios, and present a case
study on anomaly detection to demonstrate how loghub facilitates the research
and practice in this field. At the time of writing, loghub
datasets have been downloaded over 15,000 times by more than 380 organizations
from both industry and academia.
Comment: Dataset available at https://zenodo.org/record/322717
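A common first step in the AI-powered log analytics that loghub supports is grouping raw log lines into event templates. The sketch below is a minimal, hypothetical illustration of that idea (it is not loghub's own parser): variable tokens such as numbers and hex IDs are masked so that structurally identical lines collapse to one template.

```python
import re
from collections import Counter

def to_template(line: str) -> str:
    """Mask variable tokens (hex IDs first, then numbers) so log lines
    with the same structure collapse to a single template."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)
    line = re.sub(r"\d+", "<NUM>", line)
    return line

# Toy log lines standing in for a real loghub dataset.
logs = [
    "Connection from 10.0.0.1 port 22",
    "Connection from 10.0.0.2 port 443",
    "Disk error at block 0x1f3a",
]

# Count how often each template occurs; the two "Connection" lines
# collapse into one template, the disk-error line into another.
templates = Counter(to_template(line) for line in logs)
```

Real log parsers (e.g., those benchmarked on loghub) use far more robust template extraction, but the masking-and-counting pattern above captures the core preprocessing step.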
ROME: Testing Image Captioning Systems via Recursive Object Melting
Image captioning (IC) systems aim to generate a text description of the
salient objects in an image. In recent years, IC systems have been increasingly
integrated into our daily lives, such as assistance for visually-impaired
people and description generation in Microsoft PowerPoint. However, even the
cutting-edge IC systems (e.g., Microsoft Azure Cognitive Services) and
algorithms (e.g., OFA) could produce erroneous captions, leading to incorrect
captioning of important objects, misunderstanding, and threats to personal
safety. The existing testing approaches either fail to handle the complex form
of IC system output (i.e., sentences in natural language) or generate unnatural
images as test cases. To address these problems, we introduce Recursive Object
MElting (Rome), a novel metamorphic testing approach for validating IC systems.
Different from existing approaches that generate test cases by inserting
objects, which easily makes the generated images unnatural, Rome melts (i.e.,
removes and inpaints) objects. Rome assumes that the object set in the caption of
an image includes the object set in the caption of a generated image after
object melting. Given an image, Rome can recursively remove its objects to
generate different pairs of images. We use Rome to test one widely-adopted
image captioning API and four state-of-the-art (SOTA) algorithms. The results
show that the test cases generated by Rome look much more natural than those of
the SOTA IC testing approach and achieve naturalness comparable to the original
images. Meanwhile, by generating test pairs using 226 seed images, Rome reports
a total of 9,121 erroneous issues with high precision (86.47%-92.17%). In
addition, we further utilize the test cases generated by Rome to retrain
Oscar, which improves its performance across multiple evaluation metrics.
Comment: Accepted by ISSTA 202
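Rome's metamorphic relation can be checked without any ground-truth captions: after melting (removing) objects from an image, the caption of the melted image should introduce no object absent from the original image's caption. The sketch below illustrates that subset check; the function names and the naive vocabulary-based object extraction are hypothetical simplifications, not Rome's actual implementation.

```python
def extract_objects(caption: str, vocab: set) -> set:
    """Naive object extraction: keep caption words that appear in a
    known object vocabulary (a stand-in for real caption parsing)."""
    words = {w.strip(".,").lower() for w in caption.split()}
    return words & vocab

def violates_melting_relation(orig_caption: str,
                              melted_caption: str,
                              vocab: set) -> bool:
    """Rome's metamorphic relation: the object set of the melted image's
    caption must be a subset of the original caption's object set.
    Returns True when the relation is violated (a suspected bug)."""
    return not (extract_objects(melted_caption, vocab)
                <= extract_objects(orig_caption, vocab))

VOCAB = {"dog", "ball", "person", "car"}

# Original image: a dog chasing a ball; melted image: the ball removed.
ok = violates_melting_relation("A dog chases a ball.", "A dog runs.", VOCAB)
# A caption hallucinating a "person" after melting violates the relation.
bug = violates_melting_relation("A dog chases a ball.",
                                "A person and a dog.", VOCAB)
```

Because the relation only compares the system's own outputs on an image pair, it sidesteps the oracle problem of judging free-form natural-language captions directly.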
Cu2O@PNIPAM core–shell microgels as novel inkjet materials for the preparation of CuO hollow porous nanocubes gas sensing layers
There has been long-standing interest in developing metal oxide-based sensors with high sensitivity, selectivity, fast response and low material consumption. Here we report for the first time the utilization of Cu2O@PNIPAM core–shell microgels with a nanocube-shaped core structure for the construction of novel CuO gas sensing layers. The hybrid microgels show significantly improved colloidal stability compared to native Cu2O nanocubes. Consequently, a homogeneous thin film of Cu2O@PNIPAM nanoparticles can be engineered at a quite low solid content (1.5 wt%) by inkjet printing of the dispersion at an optimized viscosity and surface tension. Most importantly, thermal treatment of the Cu2O@PNIPAM microgels forms porous CuO nanocubes, which show a much faster response to relevant trace NO2 gases than sensors produced from bare Cu2O nanocubes. This outcome is due to the fact that the PNIPAM shell successfully hinders the aggregation of CuO nanoparticles during pyrolysis, which enables full utilization of the sensor layers and better access of the gas to active sites. These results point to the great potential of such an innovative system as a gas sensor with low cost, fast response and high sensitivity.
H. J. gratefully acknowledges financial support of the CSC scholarship. S. P. acknowledges funding from the Community of Madrid under grant number 2016-T1/AMB-1695
ImDiffusion: Imputed Diffusion Models for Multivariate Time Series Anomaly Detection
Anomaly detection in multivariate time series data is of paramount importance
for ensuring the efficient operation of large-scale systems across diverse
domains. However, accurately detecting anomalies in such data poses significant
challenges. Existing approaches, including forecasting and reconstruction-based
methods, struggle to address these challenges effectively. To overcome these
limitations, we propose a novel anomaly detection framework named ImDiffusion,
which combines time series imputation and diffusion models to achieve accurate
and robust anomaly detection. The imputation-based approach employed by
ImDiffusion leverages the information from neighboring values in the time
series, enabling precise modeling of temporal and inter-correlated
dependencies and reducing uncertainty in the data, thereby enhancing the
robustness of the anomaly detection process. ImDiffusion further employs
diffusion models as time series imputers to accurately capture complex
dependencies. We use the step-by-step denoised outputs generated during
the inference process to serve as valuable signals for anomaly prediction,
resulting in improved accuracy and robustness of the detection process.
We evaluate the performance of ImDiffusion via extensive experiments on
benchmark datasets. The results demonstrate that our proposed framework
significantly outperforms state-of-the-art approaches in terms of detection
accuracy and timeliness. ImDiffusion has further been integrated into a real
production system at Microsoft, where we observe a remarkable 11.4% increase in
detection F1 score compared to the legacy approach. To the best of our
knowledge, ImDiffusion represents a pioneering approach that combines
imputation-based techniques with time series anomaly detection, while
introducing the novel use of diffusion models to the field.
Comment: To appear in VLDB 2024. Code: https://github.com/17000cyh/IMDiffusion.gi